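We introduce DeepNash, an autonomous agent capable of learning to play the imperfect-information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that artificial intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of $10^{535}$ nodes, i.e., $10^{175}$ times larger than that of Go. It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of $10^{164}$ nodes). Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome. Episodes are long, often with hundreds of moves before a player wins, and situations in Stratego cannot easily be broken down into manageably sized sub-problems as in poker. For these reasons, Stratego has been a grand challenge for the field of AI for decades, and existing AI methods barely reach an amateur level of play. DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego via self-play. The Regularised Nash Dynamics (R-NaD) algorithm, a key component of DeepNash, converges to an approximate Nash equilibrium, instead of "cycling" around it, by directly modifying the underlying multi-agent learning dynamics. DeepNash beats existing state-of-the-art AI methods in Stratego and achieved a yearly (2022) and all-time top-3 rank on the Gravon games platform, competing with human expert players.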
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
The increasing complexity of gameplay mechanisms in modern video games is leading to the emergence of a wider range of ways to play games. The variety of possible play-styles needs to be anticipated by designers through automated tests. Reinforcement Learning is a promising answer to the need for automated video game testing. To that end, one needs to train an agent to play the game while ensuring that this agent generates the same play-styles as the players, in order to give meaningful feedback to the designers. We present CARMI, a Configurable Agent with Relative Metrics as Input: an agent able to emulate the players' play-styles, even on previously unseen levels. Unlike current methods, it does not rely on full trajectories, only on summary data. Moreover, it requires only a small amount of human data, which makes it compatible with the constraints of modern video game production. This novel agent could be used to investigate behaviors and balancing during the production of a video game with a realistic amount of training time.
Modern video games are becoming richer and more complex in terms of game mechanics. This complexity allows a wide variety of ways to play the game to emerge across players. From the point of view of the game designer, this means that one needs to anticipate many different ways the game could be played. Machine Learning (ML) could help address this issue. More precisely, Reinforcement Learning is a promising answer to the need for automated video game testing. In this paper we present a video game environment which lets us define multiple play-styles. We then introduce CARI, a Configurable Agent with Reward as Input: an agent able to simulate a wide continuum of play-styles. It is not constrained to extreme archetypal behaviors, unlike current methods that use reward shaping. In addition, it achieves this through a single training loop, instead of the usual one loop per play-style. We compare this novel training approach with the more classic reward shaping approach and conclude that CARI can also outperform the baseline on archetype generation. This novel agent could be used to investigate behaviors and balancing during the production of a video game with a realistic amount of training time.
In Novel Class Discovery (NCD), the goal is to find new classes in an unlabeled set given a labeled set of known but different classes. While NCD has recently gained attention from the community, no framework has yet been proposed for heterogeneous tabular data, despite being a very common representation of data. In this paper, we propose TabularNCD, a new method for discovering novel classes in tabular data. We show a way to extract knowledge from already known classes to guide the discovery process of novel classes in the context of tabular data which contains heterogeneous variables. A part of this process is done by a new method for defining pseudo labels, and we follow recent findings in Multi-Task Learning to optimize a joint objective function. Our method demonstrates that NCD is not only applicable to images but also to heterogeneous tabular data.
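Decision-making is critical for lane changes in autonomous driving. Reinforcement learning (RL) algorithms aim to identify the values of behaviors in various situations, which makes them a promising pathway for addressing the decision-making problem. However, poor runtime safety keeps RL-based decision-making strategies from being applied to complex driving tasks in practice. To address this problem, this paper incorporates human demonstrations into an RL-based decision-making strategy. Decisions made by human subjects in a driving simulator are treated as safe demonstrations, stored in the replay buffer, and then used to enhance the RL training process. A complex lane-change task is established to examine the performance of the developed strategy. Simulation results suggest that human demonstrations can effectively improve the safety of RL decisions, and that the proposed strategy surpasses other learning-based decision-making strategies with respect to multiple driving performance measures.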
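The idea of storing human demonstrations in the replay buffer can be illustrated with a minimal sketch. The abstract does not say how demonstrations and agent experience are mixed, so the class name `DemoReplayBuffer`, the fixed `demo_fraction` per batch, and the rule that demonstrations are never evicted are all assumptions made here for illustration, not the authors' design.

```python
import random
from collections import deque


class DemoReplayBuffer:
    """Replay buffer holding human demonstrations next to agent experience.

    Assumption: demonstrations are kept permanently, while agent
    transitions are evicted first-in, first-out once capacity is reached.
    """

    def __init__(self, capacity, demo_fraction=0.25, seed=None):
        self.demos = []                       # human transitions, never evicted
        self.agent = deque(maxlen=capacity)   # agent transitions, FIFO eviction
        self.demo_fraction = demo_fraction    # hypothetical mixing ratio
        self.rng = random.Random(seed)

    def add_demo(self, transition):
        self.demos.append(transition)

    def add_agent(self, transition):
        self.agent.append(transition)

    def sample(self, batch_size):
        """Draw a batch with roughly demo_fraction human transitions.

        The batch is smaller than batch_size only if the two stores
        together do not hold enough transitions.
        """
        n_demo = min(len(self.demos), round(batch_size * self.demo_fraction))
        n_agent = min(len(self.agent), batch_size - n_demo)
        batch = self.rng.sample(self.demos, n_demo)
        batch += self.rng.sample(list(self.agent), n_agent)
        self.rng.shuffle(batch)
        return batch
```

A transition here can be any object, for example a `(state, action, reward, next_state, done)` tuple. Keeping the two stores separate is one way to guarantee that the safe demonstrations remain available throughout training.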
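Ultrasonography offers an inexpensive, widely accessible, and compact medical imaging solution. However, compared to other imaging modalities such as CT and MRI, ultrasound images notoriously suffer from strong speckle noise, which originates from the random interference of sub-wavelength scattering. This degrades ultrasound image quality and makes interpretation challenging. We propose a new unsupervised ultrasound speckle reduction and image denoising method based on maximum-a-posteriori estimation with deep generative priors learned from high-quality MRI images. To model the generative tissue reflectivity prior, we exploit normalizing flows, which in recent years have proven very powerful for modeling signal priors across a variety of applications. To facilitate generalization, we factorize the prior and train our flow model on patches from the NYU fastMRI (fully sampled) dataset. This prior is then used for inference in an iterative denoising scheme. We first validate the utility of our learned priors on noisy MRI data (no prior domain shift), and then evaluate performance on both simulated and in-vivo ultrasound images from the PICMUS and CUBDL datasets. The results show that the method outperforms other (unsupervised) ultrasound denoising methods (NLM and OBNLM) both quantitatively and qualitatively.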
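For orientation, the generic form of maximum-a-posteriori estimation with a normalizing-flow prior can be written as follows. The symbols are assumptions introduced here rather than the paper's notation: $y$ is the observed noisy image, $x$ the clean image, $f_\theta$ the invertible flow, and $p_Z$ its base density. The likelihood $p(y \mid x)$ is left unspecified because the abstract does not state the noise model.

$$
\hat{x} = \arg\max_{x} \; \log p(y \mid x) + \log p_\theta(x),
\qquad
\log p_\theta(x) = \log p_Z\big(f_\theta(x)\big) + \log \left| \det \frac{\partial f_\theta(x)}{\partial x} \right|
$$

The second identity is the standard change-of-variables formula that lets a flow evaluate the prior density exactly, which is what makes it usable inside an iterative optimization over $x$.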
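By enabling flexible on-demand systems, automated vehicles (AVs) are expected to increase traffic safety and traffic efficiency, among other benefits. This is particularly important for Singapore, one of the world's most densely populated countries, which is why the Singaporean authorities are currently actively facilitating the deployment of AVs. As a consequence, however, a formal AV road approval procedure is needed. To this end, a safety assessment framework is proposed that combines aspects of the standardized functional safety design methodology with a traffic scenario-based approach. The latter involves using driving data to extract AV-relevant traffic scenarios. The underlying approach is based on decomposing scenarios into elementary events, parametrizing the scenarios, and sampling the estimated probability density functions of the scenario parameters to create test scenarios. The resulting test scenarios are then used for virtual testing in a simulation environment and for physical testing on a proving ground and in real life. Because it is simulation-based, the proposed assessment pipeline provides statistically relevant and quantitative measures of AV performance in a relatively short time frame. Ultimately, the proposed methodology provides authorities with a formal road approval procedure for AVs. In particular, it will support the Singaporean Land Transport Authority in the road approval of AVs.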
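The step of sampling estimated probability density functions to create test scenarios might look like the sketch below. The abstract does not name an estimator, so the Gaussian kernel density estimate, Silverman's rule-of-thumb bandwidth, and the function names are assumptions chosen for illustration. Each row is one observed scenario described by its numeric parameters.

```python
import random
from statistics import stdev


def silverman_bandwidths(rows):
    """Per-parameter kernel bandwidth from Silverman's rule of thumb."""
    n = len(rows)
    dims = len(rows[0])
    return [
        1.06 * stdev([row[d] for row in rows]) * n ** (-1 / 5)
        for d in range(dims)
    ]


def sample_scenarios(rows, n_samples, seed=None):
    """Draw new parameter vectors from a Gaussian product-kernel KDE.

    Sampling from such a KDE amounts to picking one observed scenario
    at random and perturbing each parameter with Gaussian noise whose
    scale is that parameter's bandwidth. Perturbing whole rows keeps
    the dependence between parameters seen in the data.
    """
    rng = random.Random(seed)
    bandwidths = silverman_bandwidths(rows)
    samples = []
    for _ in range(n_samples):
        base = rng.choice(rows)
        samples.append([rng.gauss(v, h) for v, h in zip(base, bandwidths)])
    return samples
```

For a hypothetical cut-in scenario, a row could be `[initial_gap_m, relative_speed_mps]`. Sampled values can fall outside the physically valid range (for example a negative gap), so a real pipeline would need to reject or clip such draws.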
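To date, few studies have comprehensively compared medical image registration approaches on a wide range of complementary, clinically relevant tasks. This limits the adoption of research advances in practice and prevents fair benchmarks across competing approaches. Many new learning-based methods have been explored within the last five years, but the question of which optimization, architectural, or metric strategy is ideally suited remains open. Learn2Reg covers a wide range of anatomies (brain, abdomen, and thorax), modalities (ultrasound, CT, MRI), populations (intra- and inter-patient), and levels of supervision. We established a lower entry barrier for training and validation of 3D registration, which helped us compile results from over 65 individual method submissions by more than 20 unique teams. Our complementary set of metrics, including robustness, accuracy, plausibility, and speed, enables unique insight into the current state of the art in medical image registration. Further analyses of transferability, bias, and the importance of supervision call into question the superiority of primarily deep learning-based approaches and open new research directions into hybrid methods that leverage GPU-accelerated conventional optimization.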
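This paper presents a method for the automatic creation of variables (in the case of regression) that complement the information contained in the initial input vector. The method works as a preprocessing step in which the continuous values of the variable to be regressed are discretized into a set of intervals, which are then used to define value thresholds. Classifiers are then trained to predict whether the value to be regressed is less than or equal to each of these thresholds. The outputs of the classifiers are concatenated into an additional vector of variables that enriches the initial vector of the regression problem. The implemented system can thus be considered a generic preprocessing tool. We tested the proposed enrichment method with 5 types of regressors and evaluated it on 33 regression datasets. Our experimental results confirm the value of the approach.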
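The preprocessing pipeline described above can be sketched in a few lines. This is only an illustration under stated assumptions: equal-frequency discretization is used to obtain the thresholds, and a simple k-nearest-neighbour vote stands in for the classifiers, neither of which is specified in the abstract. All function names are hypothetical.

```python
from statistics import quantiles


def make_thresholds(train_y, n_bins=4):
    """Discretize the target into n_bins equal-frequency intervals and
    return the n_bins - 1 interior cut points used as thresholds."""
    return quantiles(train_y, n=n_bins)


def knn_probability(x, train_X, labels, k=5):
    """Fraction of the k nearest training points whose label is 1."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(x, train_X[i])),
    )
    nearest = order[:k]
    return sum(labels[i] for i in nearest) / len(nearest)


def enrich(X, train_X, train_y, thresholds, k=5):
    """Append one feature per threshold t: an estimate of P(y <= t | x).

    Note: when enriching the training rows themselves, out-of-fold
    predictions would be needed to avoid leaking the target. This
    sketch does not handle that case.
    """
    # One binary labelling of the training set per threshold.
    label_sets = [[1 if y <= t else 0 for y in train_y] for t in thresholds]
    return [
        list(x) + [knn_probability(x, train_X, labels, k) for labels in label_sets]
        for x in X
    ]
```

The enriched vectors returned by `enrich` can then be passed to any regressor in place of the original inputs. With `n_bins=4`, each input gains three extra variables.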